from collections import Counter
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import seaborn as sns
from scipy.stats import skew
from sklearn.datasets import fetch_openml
from sklearn.linear_model import SGDClassifier
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from tabulate import tabulate
from sklearn.model_selection import cross_validate, cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, precision_recall_curve, roc_curve, roc_auc_score
import plotly
plotly.offline.init_notebook_mode()
mnist = fetch_openml('mnist_784', as_frame=False, parser='auto')
print(mnist.DESCR)
**Author**: Yann LeCun, Corinna Cortes, Christopher J.C. Burges **Source**: [MNIST Website](http://yann.lecun.com/exdb/mnist/) - Date unknown **Please cite**: The MNIST database of handwritten digits with 784 features, raw data available at: http://yann.lecun.com/exdb/mnist/. It can be split in a training set of the first 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting. The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field. With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications. The MNIST database was constructed from NIST's Special Database 3 (SD-3) and Special Database 1 (SD-1). NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. The reason for this can be found in the fact that SD-3 was collected among Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test set among the complete set of samples.
Therefore it was necessary to build a new database by mixing NIST's datasets. The MNIST training set is composed of 30,000 patterns from SD-3 and 30,000 patterns from SD-1. Our test set was composed of 5,000 patterns from SD-3 and 5,000 patterns from SD-1. The 60,000 pattern training set contained examples from approximately 250 writers. We made sure that the sets of writers of the training set and test set were disjoint. SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 is available and we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set. The remaining 250 writers were placed in our test set. Thus we had two sets with nearly 30,000 examples each. The new training set was completed with enough examples from SD-3, starting at pattern # 0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern # 35,000 to make a full set with 60,000 test patterns. Only a subset of 10,000 test images (5,000 from SD-1 and 5,000 from SD-3) is available on this site. The full 60,000 sample training set is available. Downloaded from openml.org.
Description :
MNIST ("Modified National Institute of Standards and Technology") is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.
The MNIST database contains handwritten digits described by 784 features. It has 60,000 training samples and 10,000 test samples. The digits are size-normalized and centered in a 28x28 pixel box. The training and test sets contain samples from disjoint sets of writers to ensure fair evaluation.
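The canonical "first 60,000 for training, last 10,000 for testing" convention can be sketched as a simple slice. This is a minimal sketch: the zero-filled arrays and the `split_mnist` helper are stand-ins so the example runs without downloading the data.

```python
import numpy as np

def split_mnist(X, y, n_train=60000):
    """Canonical MNIST split: the first n_train samples form the training
    set, the remainder the test set (OpenML preserves this ordering)."""
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

# Demo on a zero-filled stand-in with the real dataset's shape,
# so the sketch runs without any download.
X_demo = np.zeros((70000, 784), dtype=np.uint8)
y_demo = np.zeros(70000, dtype=np.int64)
X_tr, X_te, y_tr, y_te = split_mnist(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # (60000, 784) (10000, 784)
```

With the real `mnist.data` and `mnist.target` in place of the stand-ins, the same slicing reproduces the official train/test partition.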
mnist.data.shape
(70000, 784)
X = mnist.data
y = mnist.target
def plot_digit(image_data):
    image = image_data.reshape(28, 28)
    plt.imshow(image, cmap="binary")
    plt.axis("off")

plot_digit(X[2])
plt.show()
Reference: apapiu.github.io
num_instances_per_class = 9
# digit to visualize
digit_to_visualize = 4
digit_indices = np.where(y == str(digit_to_visualize))[0]
random_indices = np.random.choice(digit_indices, size=num_instances_per_class, replace=False)
plt.figure(figsize=(10, 5))
for i, image_idx in enumerate(random_indices):
    image = X[image_idx].reshape(28, 28)
    plt.subplot(1, num_instances_per_class, i + 1)
    plt.imshow(image, cmap='gray')
    plt.axis('off')
plt.suptitle("Variability in Digit '4'", fontsize=20)
plt.tight_layout()
plt.show()
Variability in Digit '4':
The figure displays nine different instances of the digit '4' from the MNIST dataset.
Each instance shows a different handwriting style and stroke thickness.
These diverse examples highlight the inherent variability in how people write the same digit.
The grayscale images show a white digit on a black background.
The target label is 4. Each 28x28 image is flattened into 784 pixel values.
# Assuming y contains the labels
label_counts = np.bincount(y.astype(int))
# Define a color palette for each label
colors = ['skyblue', 'orange', 'green', 'red', 'purple', 'yellow', 'blue', 'brown', 'pink', 'gray']
plt.bar(range(len(label_counts)), label_counts, color=colors)
plt.xlabel('Digit number')
plt.ylabel('Sample counts')
plt.title('Number of Samples stored in each Label')
plt.xticks(range(len(label_counts)))
plt.show()
The bar chart shows the distribution of digit labels (0 to 9) in the dataset. Approximate sample counts read off the chart:
Digit 0: ~6,000 samples
Digit 1: ~8,000 samples
Digit 2: ~7,000 samples
Digit 3: ~7,000 samples
Digit 4: ~6,500 samples
Digit 5: ~6,000 samples
Digit 6: ~6,500 samples
Digit 7: between 7,500 and 8,000 samples
Digit 8: between 7,000 and 7,500 samples
Digit 9: between 6,500 and 7,000 samples
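The approximate counts above can be replaced by exact figures using the same `np.bincount` call that feeds the chart. A minimal sketch; the `label_distribution` helper and the tiny `y_demo` array are illustrative stand-ins for the real string-typed labels:

```python
import numpy as np

def label_distribution(y):
    """Exact per-digit counts and shares, instead of reading them off the chart."""
    counts = np.bincount(y.astype(int), minlength=10)
    total = counts.sum()
    return {digit: (int(c), c / total) for digit, c in enumerate(counts)}

# Small synthetic label array standing in for the real `y`.
y_demo = np.array(['0', '1', '1', '5', '9', '1'])
dist = label_distribution(y_demo)
print(dist[1])  # (3, 0.5)
```

Called with the real `y`, this prints the exact count and fraction for each digit, which is more precise than eyeballing the bar heights.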
from sklearn.decomposition import PCA
import pandas as pd

# Perform PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
# Create a DataFrame for PCA results
pca_df = pd.DataFrame({'First Dimension': X_pca[:, 0], 'Second Dimension': X_pca[:, 1], 'Digit Label': y})
# Define the order of digit labels for proper coloring
digit_label_order = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
# Plot scatter plot using Matplotlib
plt.figure(figsize=(10, 6))
for label in digit_label_order:
    plt.scatter(pca_df[pca_df['Digit Label'] == label]['First Dimension'],
                pca_df[pca_df['Digit Label'] == label]['Second Dimension'],
                label=label,
                alpha=0.7)
plt.title('Scatter Plot of MNIST Dataset (PCA)')
plt.xlabel('First Dimension')
plt.ylabel('Second Dimension')
plt.legend(title='Digit Label')
plt.show()
PCA reduces the dimensionality of the MNIST data from 784 features to a much smaller number, making the data easier to visualize and analyze while retaining most of the essential information. The reduced representation can also be reused for downstream tasks such as classification or clustering.
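How much information the two components actually retain can be read from `explained_variance_ratio_`. On MNIST the first two components typically keep only a modest share of the variance (roughly 10-17% combined; figure approximate and worth verifying), which is why the classes overlap so heavily in the 2D scatter. A self-contained sketch on random stand-in data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Random data standing in for the real (70000, 784) pixel matrix,
# so the sketch runs without a download.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 784))

pca = PCA(n_components=2)
pca.fit(X_demo)
kept = pca.explained_variance_ratio_.sum()
print(pca.explained_variance_ratio_, kept)
```

Rerunning `pca.fit` on the real `X` and summing `explained_variance_ratio_` quantifies exactly how much structure the scatter plot can show.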
import plotly.graph_objects as go
import numpy as np
# Calculate the standard deviation image
std_image = np.std(X, axis=0).reshape(28, 28)
# Create x and y coordinates
a = np.arange(0, 28)
b = np.arange(0, 28)
# Create meshgrid for x and y
a, b = np.meshgrid(a, b)
# Create the 3D surface plot
fig = go.Figure(data=[go.Surface(x=a, y=b, z=std_image, colorscale='hot')])
# Update layout
fig.update_layout(
    title='Standard Deviation Image (3D)',
    scene=dict(
        xaxis_title='X',
        yaxis_title='Y',
        zaxis_title='Standard Deviation'
    )
)
# Show the plot
fig.show()
Extreme corners: the corners of the image show little variation compared to the middle, suggesting they carry little information for recognizing digits.
Digits mostly in the middle: most digit strokes fall in the middle of the image, where pixel values vary the most; this variation is what distinguishes one digit from another.
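The low-variance corners observed above suggest those pixels could be dropped before training. One way to act on this is scikit-learn's `VarianceThreshold`; this is a sketch, not part of the notebook's pipeline, and the threshold of 1.0 and the synthetic "active pixel" layout are illustrative choices:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Synthetic stand-in: 784 pixel columns, of which only columns 200-599
# carry any signal (the rest are constant, like the image corners).
rng = np.random.default_rng(0)
X_demo = np.zeros((200, 784))
X_demo[:, 200:600] = rng.integers(0, 256, size=(200, 400))

selector = VarianceThreshold(threshold=1.0)  # drop near-constant pixels
X_reduced = selector.fit_transform(X_demo)
print(X_demo.shape, "->", X_reduced.shape)  # (200, 784) -> (200, 400)
```

On the real data the same selector would discard the flat border pixels the standard-deviation surface revealed, shrinking the feature space without losing discriminative information.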
class_selec = [1, 3, 5, 7]
X_chose = X[np.isin(y.astype(int), class_selec)]
y_chose = y[np.isin(y.astype(int), class_selec)]
if len(X_chose) == len(y_chose):
    print("Target labels and data are still in alignment..")
else:
    print("The target labels and the data are not aligned..")
Target labels and data are still in alignment..
from sklearn.model_selection import train_test_split
X_train, X_temp, y_train, y_temp = train_test_split(X_chose, y_chose, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
class_sgd = SGDClassifier(random_state=42)
# Print the sizes of each set
print("Train set size:", X_train.shape[0])
print("Validation set size:", X_val.shape[0])
print("Test set size:", X_test.shape[0])
Train set size: 20036 Validation set size: 4294 Test set size: 4294
Train set size: there are 20,036 samples in the training set, used to fit the model.
Validation set size: 4,294 samples, used to tune the model's parameters and monitor its performance during training.
Test set size: 4,294 samples, used to assess the model's final performance after training, providing an unbiased estimate of its effectiveness on new, unseen data.
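The split above is random; an optional refinement (not used in this notebook) is to pass `stratify=` so each class keeps the same proportion in every subset. A self-contained sketch of the same 70/15/15 scheme with synthetic stand-in data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic 4-class data standing in for the digit subset {1, 3, 5, 7}.
X_demo = np.zeros((1000, 4))
y_demo = np.repeat(np.arange(4), 250)

# 70% train, then split the remaining 30% evenly into validation and test,
# keeping class proportions in every subset via stratify=.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo)
X_va, X_te, y_va, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42, stratify=y_tmp)
print(len(X_tr), len(X_va), len(X_te))  # 700 150 150
print(np.bincount(y_tr))                # [175 175 175 175]
```

MNIST is close to balanced, so the effect here is small, but stratification guards against an unlucky split skewing the validation metrics.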
from sklearn.metrics import accuracy_score
precision_scores = []
recall_scores = []
f1_scores = []
confusion_matrices = []
accuracy_scores = []
for cls in class_selec:
    y_train_bn = (y_train == str(cls))
    y_val_bn = (y_val == str(cls))
    # Train the binary (one-vs-rest) classifier
    class_sgd.fit(X_train, y_train_bn)
    # Validation set predictions
    preds_val = class_sgd.predict(X_val)
    # Confusion matrix
    cm = confusion_matrix(y_val_bn, preds_val)
    confusion_matrices.append(cm)
    accuracy = accuracy_score(y_val_bn, preds_val)
    accuracy_scores.append(accuracy)
    # Precision, recall, and F1 score
    precision_scores.append(precision_score(y_val_bn, preds_val))
    recall_scores.append(recall_score(y_val_bn, preds_val))
    f1_scores.append(f1_score(y_val_bn, preds_val))

# Print the confusion matrices
print("\nConfusion Matrices:")
for i, cls in enumerate(class_selec):
    matrix = confusion_matrices[i]
    print(f"\nClass {cls} vs Rest:")
    print("Confusion Matrix:")
    print(matrix)
    print("\n")
    # Extract TN, FP, FN, TP
    true_negatives = matrix[0, 0]
    false_positives = matrix[0, 1]
    false_negatives = matrix[1, 0]
    true_positives = matrix[1, 1]
    # Print additional metrics
    print(f"True Positives: {true_positives}")
    print(f"False Positives: {false_positives}")
    print(f"False Negatives: {false_negatives}")
    print(f"True Negatives: {true_negatives}")
    print(f"Total: {np.sum(matrix)}")
    print("\n")
Confusion Matrices: Class 1 vs Rest: Confusion Matrix: [[3093 28] [ 9 1164]] True Positives: 1164 False Positives: 28 False Negatives: 9 True Negatives: 3093 Total: 4294 Class 3 vs Rest: Confusion Matrix: [[2797 402] [ 32 1063]] True Positives: 1063 False Positives: 402 False Negatives: 32 True Negatives: 2797 Total: 4294 Class 5 vs Rest: Confusion Matrix: [[3309 41] [ 95 849]] True Positives: 849 False Positives: 41 False Negatives: 95 True Negatives: 3309 Total: 4294 Class 7 vs Rest: Confusion Matrix: [[3202 10] [ 105 977]] True Positives: 977 False Positives: 10 False Negatives: 105 True Negatives: 3202 Total: 4294
For Class 1:
For Class 3:
For Class 5:
For Class 7:
In conclusion, the classifier does well at identifying Classes 1 and 7, moderately well at Classes 3 and 5, and rarely confuses them with other classes.
import pandas as pd
# Define table data
tb_dt = [["Class", "Precision", "Recall", "F1 Score", "Accuracy"]]
for i, cls in enumerate(class_selec):
    tb_dt.append([cls, precision_scores[i], recall_scores[i], f1_scores[i], accuracy_scores[i]])
# Convert to DataFrame
df = pd.DataFrame(tb_dt[1:], columns=tb_dt[0])
# Print the DataFrame
print("\nMetrics Table:")
print(df.head())
Metrics Table: Class Precision Recall F1 Score Accuracy 0 1 0.976510 0.992327 0.984355 0.991383 1 3 0.725597 0.970776 0.830469 0.898929 2 5 0.953933 0.899364 0.925845 0.968328 3 7 0.989868 0.902957 0.944418 0.973218
Output summary : the classifier demonstrates excellent performance for Classes 1 and 7, good performance for Class 5, but comparatively weaker performance for Class 3 due to a higher rate of false positives. Overall, the classifier shows effectiveness in identifying specific classes while maintaining a generally high level of accuracy across multiple classes.
Class 1: The classifier performs exceptionally well for Class 1, with high precision, recall, F1 score, and accuracy. This indicates that it reliably identifies Class 1 instances with very few false positives or false negatives.
Class 3: While the classifier demonstrates high recall for Class 3, suggesting it can identify most Class 3 instances, the precision is relatively lower. This indicates a higher rate of false positives. The F1 score and accuracy for Class 3 are lower compared to Class 1, indicating that the classifier's performance for Class 3 is not as strong.
Class 5: The classifier shows high precision for Class 5, indicating it correctly identifies most Class 5 instances when it predicts them. However, the recall is slightly lower, suggesting some Class 5 instances are missed. Overall, the F1 score and accuracy for Class 5 are high, indicating good performance.
Class 7: Similar to Class 1, the classifier performs very well for Class 7, with high precision, recall, F1 score, and accuracy. This suggests it reliably identifies Class 7 instances with very few false positives or false negatives.
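The manual per-class loop above is exactly what scikit-learn's `OneVsRestClassifier` automates: one binary SGD model per class behind a single fit/predict interface. A self-contained sketch on synthetic 4-class data standing in for the digit subset {1, 3, 5, 7}:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 4-class problem so the sketch runs without MNIST.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=10,
    n_classes=4, random_state=42)

# One binary SGD classifier is fitted per class, mirroring the loop above.
ovr = OneVsRestClassifier(SGDClassifier(random_state=42))
ovr.fit(X_demo, y_demo)
print(len(ovr.estimators_))  # 4 binary classifiers, one per class
```

The manual loop remains useful when, as here, you want the per-class confusion matrices and thresholds explicitly; the wrapper is the convenient choice when you only need multiclass predictions.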
# class selection
chosen_class_label = 5
# Filter dataset for the chosen class vs. all others
X_bn_cl = X.copy()
y_bn_cl = (y == str(chosen_class_label)).astype(int)
# Split the dataset
X_train, X_temp, y_train, y_temp = train_test_split(X_bn_cl, y_bn_cl, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
# Initialize the sgd classifier
class_sgd = SGDClassifier(random_state=42)
# classifier is trained
class_sgd.fit(X_train, y_train)
# Getting scores for the positive class
y_positive_class = class_sgd.decision_function(X_val)
# Calculating the precision and recall
precision, recall, thresholds = precision_recall_curve(y_val, y_positive_class)
# Plotting curve of precision recall
plt.figure(figsize=(7, 6))
plt.plot(recall, precision, linewidth=2, label="Precision/Recall curve", color="#1f77b4") # blue color
# Mark the point where precision first reaches 0.8
threshold_index = np.argmax(precision >= 0.8)
plt.plot([recall[threshold_index], recall[threshold_index]], [0., precision[threshold_index]], "k:")
plt.plot([0.0, recall[threshold_index]], [precision[threshold_index], precision[threshold_index]], "k:")
plt.plot([recall[threshold_index]], [precision[threshold_index]], "ko", label=f"Threshold {thresholds[threshold_index]:.2f}")
# Add arrow and text annotation
plt.gca().add_patch(
    patches.FancyArrowPatch(
        (recall[threshold_index] + 0.05, precision[threshold_index] + 0.01),
        (recall[threshold_index] - 0.2, precision[threshold_index] + 0.12),
        connectionstyle="arc3,rad=.2",
        arrowstyle="Simple, tail_width=1.5, head_width=8, head_length=10",
        color="#9467bd"  # purple
    )
)
plt.text(recall[threshold_index] - 0.05, precision[threshold_index] + 0.1, "Higher Threshold", color="#2ca02c") # green color
# Set labels, axis limits, grid, and legend
plt.xlabel("Recall", color="#ff7f0e") # orange color
plt.ylabel("Precision", color="#17becf") # cyan color
plt.axis([0, 1, 0, 1])
plt.grid(color="#d9d9d9") # light gray color
plt.legend(loc="lower left")
plt.title("Precision-Recall Curve")
plt.show()
Understanding the Axes:
Curve Behavior:
Threshold Selection:
Numerical Values:
Conclusion
from sklearn.metrics import precision_score, recall_score
import pandas as pd

def evaluate_classification_metrics(true_labels, predicted_scores):
    thresholds = sorted(predicted_scores, reverse=True)
    metrics_results = []
    for threshold in thresholds:
        predicted_labels = (predicted_scores >= threshold).astype(int)
        precision = precision_score(true_labels, predicted_labels)
        recall = recall_score(true_labels, predicted_labels)
        metrics_results.append([threshold, precision, recall])
    df_metrics_results = pd.DataFrame(metrics_results, columns=['Threshold', 'Precision', 'Recall'])
    return df_metrics_results

def find_best_performance_thresholds(df_metrics_results):
    best_precision_row = df_metrics_results[df_metrics_results['Precision'] == df_metrics_results['Precision'].max()]
    best_recall_row = df_metrics_results[df_metrics_results['Recall'] == df_metrics_results['Recall'].max()]
    best_precision_threshold = best_precision_row['Threshold'].values[0]
    best_recall_threshold = best_recall_row['Threshold'].values[0]
    return best_precision_threshold, best_recall_threshold

def predict_with_optimal_threshold(predicted_scores, threshold):
    predicted_labels = (predicted_scores >= threshold).astype(int)
    return predicted_labels

def calculate_final_evaluation_metrics(true_labels, predicted_labels):
    precision = precision_score(true_labels, predicted_labels)
    recall = recall_score(true_labels, predicted_labels)
    return precision, recall
predicted_scores_validation = class_sgd.decision_function(X_val)
df_metrics_results_validation = evaluate_classification_metrics(y_val, predicted_scores_validation)
best_precision_threshold_validation, best_recall_threshold_validation = find_best_performance_thresholds(df_metrics_results_validation)
predicted_labels_validation_at_best_precision = predict_with_optimal_threshold(predicted_scores_validation, best_precision_threshold_validation)
predicted_labels_validation_at_best_recall = predict_with_optimal_threshold(predicted_scores_validation, best_recall_threshold_validation)
precision_validation_at_precision, recall_validation_at_precision = calculate_final_evaluation_metrics(y_val, predicted_labels_validation_at_best_precision)
precision_validation_at_recall, recall_validation_at_recall = calculate_final_evaluation_metrics(y_val, predicted_labels_validation_at_best_recall)
# Creating DataFrame for evaluation results
evaluation_results_validation = pd.DataFrame({
'Metric': ['Precision', 'Recall'],
'Best Precision Threshold': [precision_validation_at_precision, recall_validation_at_precision],
'Best Recall Threshold': [precision_validation_at_recall, recall_validation_at_recall],
'Final Chosen Threshold': [best_precision_threshold_validation, best_recall_threshold_validation]
})
# Displaying evaluation results in tabular format using tabulate function
print("\nValidation set evaluation results:")
print(tabulate(evaluation_results_validation, headers='keys', tablefmt='grid'))
Validation set evaluation results: +----+-----------+----------------------------+-------------------------+--------------------------+ | | Metric | Best Precision Threshold | Best Recall Threshold | Final Chosen Threshold | +====+===========+============================+=========================+==========================+ | 0 | Precision | 1 | 0.0924614 | 22437.3 | +----+-----------+----------------------------+-------------------------+--------------------------+ | 1 | Recall | 0.00104932 | 1 | -30018.2 | +----+-----------+----------------------------+-------------------------+--------------------------+
Output: the table reports, for each metric, the threshold at which it is maximized on the validation set. At the threshold that maximizes precision (22437.3), precision is a perfect 1.0 but recall collapses to about 0.09: almost nothing is flagged as positive, but every flagged instance is correct. At the threshold that maximizes recall (-30018.2), recall is 1.0 but precision drops to about 0.001: every positive instance is found, at the cost of flagging nearly everything. Neither extreme is useful on its own; a practical threshold must balance precision and recall between them.
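A common compromise between these extremes is to pick the lowest threshold whose precision meets a target, rather than the degenerate max-precision point. A sketch using `precision_recall_curve`; the 0.90 target and the toy scores (standing in for the SGD decision-function values) are illustrative choices:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy labels and decision scores standing in for y_val and the
# SGD decision_function output.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 1])
scores = np.array([-2.0, -1.5, -1.0, 0.5, 0.2, 1.0, 1.5, 2.0, 1.2, 0.8])

precision, recall, thresholds = precision_recall_curve(y_true, scores)
idx = int(np.argmax(precision >= 0.90))      # first point meeting the target
chosen = thresholds[min(idx, len(thresholds) - 1)]
print(chosen, precision[idx], recall[idx])   # 1.5 1.0 0.4
```

Applied to the validation scores above, this selects an operating point that guarantees the required precision while keeping as much recall as the curve allows.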
y_on_test = class_sgd.decision_function(X_test)
df_metrics_results_test = evaluate_classification_metrics(y_test, y_on_test)
best_precision_threshold_test, best_recall_threshold_test = find_best_performance_thresholds(df_metrics_results_test)
y_pred_test_at_best_precision = predict_with_optimal_threshold(y_on_test, best_precision_threshold_test)
y_pred_test_at_best_recall = predict_with_optimal_threshold(y_on_test, best_recall_threshold_test)
precision_test_precision, recall_test_precision = calculate_final_evaluation_metrics(y_test, y_pred_test_at_best_precision)
precision_test_recall, recall_test_recall = calculate_final_evaluation_metrics(y_test, y_pred_test_at_best_recall)
# Creating DataFrame for test results
test_results = pd.DataFrame({
'Metric': ['Precision', 'Recall'],
'Best Precision Threshold': [precision_test_precision, recall_test_precision],
'Best Recall Threshold': [precision_test_recall, recall_test_recall],
'Final Chosen Threshold': [best_precision_threshold_test, best_recall_threshold_test]
})
# Displaying test results in tabular format using tabulate function
print("\nTest set results:")
print(tabulate(test_results, headers='keys', tablefmt='grid'))
Test set results: +----+-----------+----------------------------+-------------------------+--------------------------+ | | Metric | Best Precision Threshold | Best Recall Threshold | Final Chosen Threshold | +====+===========+============================+=========================+==========================+ | 0 | Precision | 1 | 0.093208 | 16438.4 | +----+-----------+----------------------------+-------------------------+--------------------------+ | 1 | Recall | 0.0010395 | 1 | -28545.6 | +----+-----------+----------------------------+-------------------------+--------------------------+
Output: the test set shows the same pattern. Maximizing precision (threshold 16438.4) yields precision 1.0 with recall of about 0.09; maximizing recall (threshold -28545.6) yields recall 1.0 with precision of about 0.001. As on the validation set, these extremes mark the boundaries of the trade-off, and a practical application would choose a threshold between them to balance precision and recall.
# Calculating decision function scores and predictions for the test dataset
scores_test = class_sgd.decision_function(X_test)
predictions_test = class_sgd.predict(X_test)
# Calculate metrics results for the test dataset
df_results_test = evaluate_classification_metrics(y_test, scores_test)
# Find the best thresholds for precision and recall
best_precision_threshold, best_recall_threshold = find_best_performance_thresholds(df_results_test)
# Apply the best precision and recall thresholds to get predicted labels
labels_test_at_best_precision = predict_with_optimal_threshold(scores_test, best_precision_threshold)
labels_test_at_best_recall = predict_with_optimal_threshold(scores_test, best_recall_threshold)
# Calculate final metrics for precision and recall
precision_at_best_precision, recall_at_best_precision = calculate_final_evaluation_metrics(y_test, labels_test_at_best_precision)
precision_at_best_recall, recall_at_best_recall = calculate_final_evaluation_metrics(y_test, labels_test_at_best_recall)
# Calculating F1 score and accuracy at the classifier's default threshold
f1_test = f1_score(y_test, predictions_test)
accuracy = accuracy_score(y_test, predictions_test)
# Creating DataFrame to store the results on the test data
results_test_data = pd.DataFrame({
    'Metric': ['F1 Score', 'Accuracy'],
    'Chosen Threshold': [best_precision_threshold, best_recall_threshold],
    'F1 Score': [f1_test, None],
    'Precision at Best': [precision_at_best_precision, precision_at_best_recall],
    'Recall at Best': [recall_at_best_precision, recall_at_best_recall],
    'Accuracy': [None, accuracy]
})
# Creating DataFrame to display F1 score and accuracy
f1_accuracy = pd.DataFrame({
    'Metrics': ['F1 Score', 'Accuracy'],
    'Score': [f1_test, accuracy]
})
# Displaying the F1 score and accuracy DataFrame
print("\nOutput:")
f1_accuracy
# Displaying the results on the test data DataFrame
results_test_data
Output:
| | Metric | Chosen Threshold | F1 Score | Precision at Best | Recall at Best | Accuracy |
|---|---|---|---|---|---|---|
| 0 | F1 Score | 16438.401625 | 0.757264 | 1.000000 | 0.093208 | NaN |
| 1 | Accuracy | -28545.629775 | NaN | 0.001040 | 1.000000 | 0.948286 |
Conclusion:
Metric:
This column indicates the evaluation metric being measured, which includes "F1 Score" and "Accuracy".
Chosen Threshold:
F1 Score:
Precision at Best:
Recall at Best:
*Recall, also known as sensitivity, measures the proportion of actual positive instances that the model correctly identifies. A value of 1.0 indicates perfect recall.
Accuracy:
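The recall definition above can be checked numerically against the confusion-matrix layout used earlier in the notebook. A tiny hand-made example:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Recall = TP / (TP + FN): the share of actual positives the model finds.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])

# sklearn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual_recall = tp / (tp + fn)
print(manual_recall, recall_score(y_true, y_pred))  # 0.75 0.75
```

Three of the four actual positives are found (one false negative), so both the manual formula and `recall_score` give 0.75.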
My Synopsis of the entire notebook:
The MNIST dataset is a collection of 70,000 images of handwritten digits, each 28 by 28 pixels in size.
Examining the Data: the samples are roughly evenly distributed among the digit labels 0-9. The standard deviation image exposes how pixel intensity varies across the dataset, and the 3D surface plot shows that this variation is concentrated in the centre of the image.
Analysing Model Performance: I am impressed by the SGDClassifier's performance during my evaluation process, especially when I concentrate on the subset of digit classes 1, 3, 5, and 7.
Going Deeper: Using confusion matrices to illustrate classification efficacy across the chosen classes, I carry out a comprehensive investigation of the model's performance. In particular, precision-recall curves for the selected classes offer important information about the trade-offs. I can improve precision or recall without retraining by utilising threshold optimisation approaches.
Insights and Optimisation Techniques: I see that the model performs well overall, showing good recall and precision rates in a variety of tests, particularly for the chosen digit classes. I may change the model's behaviour to match particular work requirements.
Final Thoughts: Using the MNIST dataset, my research offers a useful framework for handling classification jobs successfully. I pay particular attention to the digit classes 1, 3, 5, and 7. My comprehension of the dataset and model performance has improved as a result of the newfound insights, which also offer doable tactics for enhancing classification results in this particular situation.